4.2.1.1.9 Filtering variants
Once variants have been called, it is important to filter out low-quality variants
before annotation and interpretation. Traditional variant filtering checks the QUAL
field and some of the annotations listed in the INFO and FORMAT fields of a VCF file,
such as MQ (mapping quality), DP (read depth), FS (FisherStrand), and SOR
(StrandOddsRatio), and applies threshold values to filter out the variants that do not
meet the criteria. The variant call quality (QUAL), a Phred-scaled score, is the most
commonly used filter parameter. The traditional
VCF filtering is performed with bcftools, using either “bcftools filter” or “bcftools
view” with the options “-i” (include) and “-e” (exclude) to filter variants in or out.
The FILTER field of the VCF file is empty (“.”) before any filter is applied, as shown
in Figure 4.4. Once a filter has been applied, a variant in the VCF file either passes
or fails. If it passes, PASS is added to the FILTER field, and any variant that does
not meet the criterion is filtered out, leaving only the ones with PASS (see Figure 4.5).
The following command filters in the variants with Phred quality scores greater than 60:
bcftools filter -O z \
    -o filtered_sarscov2.vcf.gz \
    -i 'QUAL>60' sarscov2.vcf.gz
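To put the threshold in context: in the VCF specification, QUAL is Phred-scaled, that is, QUAL = -10 log10(p), where p is the probability that the ALT call is wrong. The following is a minimal sketch of that conversion using awk. The helper name phred_to_prob is hypothetical and is not part of bcftools.

```shell
# Hypothetical helper (not a bcftools command): convert a Phred-scaled
# QUAL value into the implied probability that the variant call is wrong,
# using p = 10^(-QUAL/10).
phred_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%.0e\n", 10 ^ (-q / 10) }'
}

phred_to_prob 60   # the QUAL threshold used in the command above
phred_to_prob 20   # a much more permissive threshold, for comparison
```

Under this definition, a QUAL of 60 corresponds to an error probability of one in a million, which is why it acts as a strict cutoff.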
The same variants are retained by the following command, which filters out variants
with a quality score less than or equal to 60:
bcftools view -O z \
    -o filtered_sarscov2.vcf.gz \
    -e 'QUAL<=60' sarscov2.vcf.gz
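The effect of the QUAL threshold can also be illustrated without bcftools. The sketch below assumes an uncompressed, tab-delimited VCF and uses awk to keep the header lines and the records whose QUAL (column 6) is greater than the threshold, writing PASS into an empty FILTER (column 7). The function name filter_by_qual and the toy records are invented for illustration. This is not how bcftools is implemented, and it covers only the simple case described here.

```shell
# Hypothetical helper: keep VCF records with QUAL greater than a threshold.
# Usage: filter_by_qual MIN_QUAL FILE.vcf
filter_by_qual() {
    awk -F'\t' -v OFS='\t' -v min_qual="$1" '
        /^#/ { print; next }                    # header lines pass through
        $6 != "." && $6 + 0 > min_qual + 0 {    # column 6 is QUAL; "." = missing
            if ($7 == ".") $7 = "PASS"          # column 7 is FILTER
            print
        }
    ' "$2"
}

# Toy input with made-up records (QUAL values 228, 60, 35.5, and 99).
demo_vcf="$(mktemp)"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chrDemo\t100\t.\tC\tT\t228\t.\tDP=250\n'
    printf 'chrDemo\t200\t.\tG\tA\t60\t.\tDP=310\n'
    printf 'chrDemo\t300\t.\tA\tG\t35.5\t.\tDP=120\n'
    printf 'chrDemo\t400\t.\tT\tC\t99\t.\tDP=420\n'
} > "$demo_vcf"

# Keep only the records with QUAL > 60.
filter_by_qual 60 "$demo_vcf"
```

Note that the record with QUAL exactly 60 is dropped, which matches both the -i 'QUAL>60' and the -e 'QUAL<=60' expressions.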
Variants can also be filtered on other criteria using the statistics or annotations
in the INFO or FORMAT fields. For example, to filter out variants with a depth of
less than 300, you can use the following command:
FIGURE 4.4 VCF file displayed in a spreadsheet.